{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inclusive ML - Understanding Bias\n",
    "\n",
    "**Learning Objectives**\n",
    "\n",
    "1. Invoke the What-if Tool against a deployed Model\n",
    "1. Explore attributes of the dataset\n",
    "1. Examine aspects of bias in model results\n",
    "1. Evaluate how the What-if Tool provides suggestions to remediate bias\n",
    "\n",
    "\n",
    "## Introduction \n",
    "\n",
    "This notebook shows use of the [What-If Tool](https://pair-code.github.io/what-if-tool) inside of a Jupyter notebook.  The What-If Tool, among many other things, allows you to explore the impacts of Fairness in model design and deployment.\n",
    "\n",
    "The notebook invokes a previously deployed XGBoost classifier model on the [UCI census dataset](https://archive.ics.uci.edu/ml/datasets/census+income) which predicts whether a person earns more than $50K based on their census information.\n",
    "\n",
    "You will then visualize the results of the trained classifier on test data using the What-If Tool.  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, you will import various libaries and settings that are required to complete the lab."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip freeze | grep httplib2==0.18.1 || pip install httplib2==0.18.1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import witwidget\n",
    "from witwidget.notebook.visualization import (\n",
    "    WitWidget, \n",
    "    WitConfigBuilder,\n",
    ")\n",
    "pd.options.display.max_columns = 50"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT = !(gcloud config list --format=\"value(core.project)\")\n",
    "PROJECT = PROJECT[0]\n",
    "\n",
    "BUCKET = \"gs://{project}\".format(project=PROJECT)\n",
    "MODEL = 'xgboost_model'\n",
    "VERSION = 'v1'\n",
    "MODEL_DIR = os.path.join(BUCKET, MODEL)\n",
    "\n",
    "\n",
    "os.environ['PROJECT'] = PROJECT\n",
    "os.environ['BUCKET'] = BUCKET\n",
    "os.environ['MODEL'] = MODEL\n",
    "os.environ['VERSION'] = VERSION\n",
    "os.environ['MODEL_DIR'] = MODEL_DIR"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up the notebook environment\n",
    "\n",
    "First you must perform a few environment and project configuration steps.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "These steps may take __8 to 10 minutes__, please wait until you see the following response before proceeding:\n",
    "\n",
    "\"__Creating version (this might take a few minutes)......done.__\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "gcloud config set project $PROJECT\n",
    "\n",
    "gsutil mb $BUCKET\n",
    "gsutil cp gs://cloud-training-demos/mlfairness/model.bst $MODEL_DIR/model.bst\n",
    "\n",
    "gcloud ai-platform models list | grep $MODEL || gcloud ai-platform models create $MODEL\n",
    "\n",
    "gcloud ai-platform versions list --model $MODEL | grep $VERSION ||\n",
    "gcloud ai-platform versions create $VERSION \\\n",
    "  --model=$MODEL \\\n",
    "  --framework='XGBOOST' \\\n",
    "  --runtime-version=1.14 \\\n",
    "  --origin=$MODEL_DIR \\\n",
    "  --python-version=3.5 \\\n",
    "  --project=$PROJECT"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, download the data and arrays needed to use the What-if Tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "gsutil cp gs://cloud-training-demos/mlfairness/income.pkl .\n",
    "gsutil cp gs://cloud-training-demos/mlfairness/x_test.npy .\n",
    "gsutil cp gs://cloud-training-demos/mlfairness/y_test.npy .  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features = pd.read_pickle('income.pkl')\n",
    "x_test = np.load('x_test.npy')\n",
    "y_test = np.load('y_test.npy')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now take a quick look at the data.  The ML model type used for this analysis is XGBoost.  XGBoost is a machine learning framework that uses decision trees and gradient boosting to build predictive models. It works by ensembling multiple decision trees together based on the score associated with different leaf nodes in a tree. \n",
    "\n",
    "XGBoost requires all values to be numeric so the orginial dataset was slightly modified.  The biggest change made was to assign a numeric value to Sex.  The originial dataset only had the values \"Female\" and \"Male\" for Sex.  The decision was made to assign the value \"1\" to Female and \"2\" to Male. As part of the data prepartion effort the Pandas function \"get_dummies\" was used to convert the remaining domain values into numerical equivalent.  For instance the \"Education\" column was turned into several sub-columns named after the value in the column.  For instance the \"Education_HS-grad\" has a value of \"1\" for when that was the orginial categorical value and a value of \"0\" for other cateogries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To connect the What-if Tool to an AI Platform model, you need to pass it a subset of your test examples.  The command below will create a Numpy array of 2000 from our test examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Combine the features and labels into one array for the What-if Tool\n",
    "\n",
    "num_wit_examples = 2000\n",
    "\n",
    "test_examples = np.hstack((\n",
    "    x_test[:num_wit_examples],\n",
    "    y_test[:num_wit_examples].reshape(-1, 1)\n",
    "))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instantiating the What-if Tool is as simple as creating a WitConfigBuilder object and passing it the AI Platform model desired to be analyzed.\n",
    "\n",
    "The optional \"adjust_prediction\" parameter is used because the What-if Tool expects a list of scores for each class in our model (in this case 2). Since the model only returns a single value from 0 to 1, it must be transformed to the correct format in this function.   Lastly, the name 'income_prediction' is used as the ground truth label.\n",
    "\n",
    "It may take 1 to 2 minutes for the What-if Tool to load and render the visualization palette, please be patient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO 1\n",
    "FEATURE_NAMES = features.columns.tolist() + ['income_prediction']\n",
    "\n",
    "\n",
    "def adjust(pred):\n",
    "    return [1 - pred, pred]\n",
    "\n",
    "\n",
    "config_builder = (\n",
    "    WitConfigBuilder(test_examples.tolist(), FEATURE_NAMES)\n",
    "    .set_ai_platform_model(PROJECT, MODEL, VERSION, adjust_prediction=adjust)\n",
    "    .set_target_feature('income_prediction')\n",
    "    .set_label_vocab(['low', 'high'])\n",
    ")\n",
    "\n",
    "WitWidget(config_builder, height=800)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## In the following steps we will use the What-if Tool to examine our model "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the What-If Tool is embedded in the notebook you must use care when scrolling.  The What-if Tool has its own internal scrolling windows so you may need to reposition the frame window to reach the desired location.  To do this you must ensure you are at the far edges of the cell to scroll up and down as noted by the red oblong markers in the image below.\n",
    "\n",
    "Also, it might be helpful to widen the display frame in the notebook.  You can do this by dragging the vertical grey bar to the left (the place to click and hold is noted by the red arrows)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![InclusiveML](./assets/WIT_25.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The scenario for consideration is that this model, which is used to predict income levels, will be used in a loan approval process.   Your task is to determine its fitness for such a use case from a fairness perspective."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TODO 2:** Your first action will be to examine data and its disribution along dimensions that are relevant to loan scoring.   The intial presentation in the tool shows all datapoints.  Blue dots are those individuals predicted as having incomes above 50k.  Red dots are those predicted as having incomes below 50k."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On the \"Datapoint Editor\" tab, under \"Binning | X-Axis\" select \"Education_Masters\":"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![InclusiveML](./assets/BinX_Menu_Choice.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results show that a majority of Masters degree holders, domain value \"1\" (in red above the visualization), have incomes above 50k."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![InclusiveML](./assets/BinX_Results.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Return to the \"Binning | X-Axis\" selector and choose \"None\" to reset the visualization."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TODO 2:** Next navigate to the Features tab, here you can see the exact distribution of values for every feature in the dataset.  If you type \"sex\" into the filter box, you will see that of the 2,000 test datapoints, 670 from Women and 1,330 are from Men (as mentioned earlier the value \"1\" was assigned to Females and \"2\" was assigned to Males). The dataset reflects an imbalance between Females and Males with nearly double the number of cases that are Male. Women seem under-represented in this dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![InclusiveML](./assets/Features.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TODO 3:** On the \"Performance + Fairness\" tab, you can set an input feature (or set of features) by which to slice the data. This will allow you to evaluate the fairness of specific groups.  Income Prediction (corresponding to over or under 50k) has already been selected as the \"Ground Truth Feature\". On the \"Slice by\" selector, scroll to find and choose \"Sex\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![InclusiveML](./assets/PerfTab.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This selection allows you to see the breakdown of model performance on female datapoints versus male datapoints.  Even before you drill into the details you can see that the model has a lower F1 score for females than males. Drilling down on each value (by clicking the arrow beside the domain value) you see that the model predicts high income for females much less than it does for males:  3.4% of the time for females vs 18.9% of the time for males."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![InclusiveML](./assets/MaleDetails.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![InclusiveML](./assets/FemaleDetails.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the use-case scenario, the plan is to use this simple income classifier to approve or reject loan applications (not a realistic example but it illustrates the effect of bias in ML usage). In this case, 18.9% of men from the test dataset have their loans approved but only 3.4% of women have their loans approved. If you wanted to ensure than men and women get their loans approved the same percentage of the time, that is a fairness concept called \"demographic parity\". An XGBoost model intially defaults to a 0.50 threshold, which is what appears upon initial examination.  One way to achieve demographic parity would be to have different classification thresholds for females and males in our model. You'll notice there is a button on the tool labeled \"demographic parity\". When you press this button, the tool will take the cost ratio into account, and come up with ideal separate thresholds for men and women that will achieve demographic parity over the test dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TODO 4:** On the \"Performance + Fairness\" tab, select \"Demographic Parity\" to see the results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![InclusiveML](./assets/DemoParityButton_Resize.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By drilling down on each domain value you can see the automatic adjustments.  In this case, demographic parity can be found with both groups getting loans approved/predicting a high income ~17.7% of the time.  This occurs when the female threshold is set to 0.19 and the male threshold is set to 0.54. Because of the vast difference in the properties of the female and male training data in this 1994 census dataset, you need quite different thresholds to achieve demographic parity. With the high male threshold you may notice there are many more false negatives than before, and with the low female threshold there are many more false positives than before.  To reset the Peformance and Fairness Tab simply choose another domain value in the \"Slice by\" selector."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![InclusiveML](./assets/MaleDemoParitySlider.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![InclusiveML](./assets/FemaleDemoParitySlider.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The use of these features can help shed light on subsets of your data on which your classifier is performing very differently. Understanding biases in your datasets and data slices on which your model has disparate performance are very important parts of analyzing a model for fairness. There are many approaches to improving fairness, including augmenting training data, building fairness-related loss functions into your model training procedure, and post-training inference adjustments like those seen in WIT. The WIT provides a great interface for furthering ML fairness learning, but of course there is no silver bullet to improving ML fairness.\n",
    "\n",
    "Feel free to explore the What-if Tool and find additional insights."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "<br>\n",
    "<br>\n",
    "<br>\n",
    "Copyright 2018 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  }
 ],
 "metadata": {
  "environment": {
   "name": "tf2-gpu.2-1.m59",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-1:m59"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
